Text Database Cleaning by Filling the Missing Values Using Object Oriented Intelligent Multi-Agent System Data Cleaning Architecture
Abstract
Agents are software programs that perform tasks on behalf of others; here, their characteristics are used to clean a text database. Agents are task-oriented, able to learn by themselves, and reactive to their situation. An agent learns by checking its previous experience in its knowledge base. The agent concept complements the object-oriented paradigm in the design and implementation of autonomous entities driven by beliefs, goals, and plans. The text database cleaning process detects and cleans wrong, duplicate, or missing data by identifying outliers. This work focuses on cleaning incomplete data, which is performed using the attribute missing rate. The agents in the architectural design of the text database cleaning process combine the features of the Multi-Agent System (MAS) framework and the MAS with Learning (MAS-L) framework. The MAS framework reduces the development time and the complexity of implementing software agents. The MAS-L framework incorporates the intelligence and learning properties of the agents in the system; it uses decision tree learning and an evaluation function to select the next best decision for the machine learning technique. This paper proposes a design for a Multi-Agent based Data Cleaning Architecture that incorporates the structural design of agents into an object model. The architectural model inherits the features of the MAS framework and uses the MAS-L framework to design the intelligence and learning characteristics.
Keywords: Text database, Incomplete data, Agents, MAS, MASL, Architecture, Data Cleaning
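The abstract names the attribute missing rate as the basis for incomplete data cleaning but does not define it here. A minimal sketch follows, assuming the rate is the fraction of records in which an attribute is empty, and assuming a simple most-frequent-value fill. The names `attribute_missing_rate`, `fill_missing`, and the `max_rate` threshold are hypothetical illustrations, not the paper's agents or its MAS-L decision logic.

```python
from collections import Counter

# Values treated as "missing" in this sketch (an assumption).
MISSING = (None, "")


def attribute_missing_rate(records, attribute):
    """Fraction of records whose value for `attribute` is missing."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(attribute) in MISSING)
    return missing / len(records)


def fill_missing(records, attribute, max_rate=0.5):
    """Return copies of the records with missing values filled.

    The fill value is the most frequent observed value. Nothing is
    filled when the missing rate exceeds `max_rate` (hypothetical
    threshold) or when no value has been observed at all.
    """
    rate = attribute_missing_rate(records, attribute)
    observed = [r[attribute] for r in records if r.get(attribute) not in MISSING]
    if rate > max_rate or not observed:
        return [dict(r) for r in records]
    fill_value = Counter(observed).most_common(1)[0][0]
    return [
        {**r, attribute: fill_value} if r.get(attribute) in MISSING else dict(r)
        for r in records
    ]


records = [
    {"name": "a", "city": "Paris"},
    {"name": "b", "city": ""},
    {"name": "c", "city": "Paris"},
    {"name": "d", "city": "Rome"},
]
rate = attribute_missing_rate(records, "city")
filled = fill_missing(records, "city")
```

In this sketch an attribute with too many gaps is left untouched, on the assumption that a fill drawn from very few observed values would be unreliable; a learning agent could instead choose the fill strategy per attribute.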
Similar resources
Data Cleaning: Approaches for Earth Observation Image Information Mining
The growing volume of data provided by different sources may sometimes present inconsistencies: the data could be incomplete, lacking values or containing only aggregate data, or noisy, containing errors or outliers. Data cleaning then consists of filling in missing values, smoothing noisy data, identifying or removing outliers, and resolving inconsistencies. In more general definition, data cleanin...
Privacy-Preserving Imputation of Missing
Handling missing data is a critical step to ensuring good results in data mining. Like most data mining algorithms, existing privacy-preserving data mining algorithms assume data is complete. In order to maintain privacy in the data mining process while cleaning data, privacy-preserving methods of data cleaning will be required. In this paper, we address the problem of privacy-preserving data i...
Privacy-preserving imputation of missing data
Handling missing data is a critical step to ensuring good results in data mining. Like most data mining algorithms, existing privacy-preserving data mining algorithms assume data is complete. In order to maintain privacy in the data mining process while cleaning data, privacy-preserving methods of data cleaning are required. In this paper, we address the problem of privacy-preserving data imput...
QOCO: A Query Oriented Data Cleaning System with Oracles
As key decisions are often made based on information contained in a database, it is important for the database to be as complete and correct as possible. For this reason, many data cleaning tools have been developed to automatically resolve inconsistencies in databases. However, data cleaning tools provide only best-effort results and usually cannot eradicate all errors that may exist in a data...
Data Cleansing during Data Collection from Wireless Sensor Networks
Quality of data in Wireless Sensor Networks (WSNs) is one of the major concerns for many applications. The data quality may drop due to various reasons including the existence of missing values and incorrect values (also known as noisy or corrupt values) that can be caused by factors such as interference and machine malfunctioning. A drop in data quality may seriously impact the performance of ...
Publication date: 2010